Second and third place go to Google’s Gemini 3 Pro at 2.9% and Mistral’s Voxtral Small at 3.0%, respectively. Other strong performers include Google Gemini 3 Flash at 3.1% and ElevenLabs Scribe v1 at 3.2%. In the middle of the pack are models such as OpenAI’s GPT-4o Transcribe at 4.0% and Whisper Large v3 at 4.2%. Toward the lower end of the ranking are Alibaba’s Qwen3 ASR Flash at 5.9%, Amazon Nova 2 Omni at 6.0%, and Rev AI at 6.1%.
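For context on the metric behind these rankings: word error rate is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between the model's transcript and a reference transcript, divided by the number of words in the reference. A minimal sketch (the function name `wer` and the example sentences are illustrative, not from the benchmark):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Standard Levenshtein dynamic program, but over words instead of characters.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all remaining reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])  # match or substitution
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)


# One deletion against a six-word reference gives a WER of about 16.7%.
print(wer("the cat sat on the mat", "the cat sat on mat"))
```

So a 2.9% score means roughly three word-level errors per hundred reference words; production benchmarks like AA-WER additionally normalize transcripts (casing, punctuation) before scoring, which this sketch omits.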

ElevenLabs Scribe v2 leads the overall AA-WER v2.0 benchmark ranking with the lowest word error rate, followed by Google Gemini 3 Pro and Mistral Voxtral Small. | Image: Artificial Analysis

In a separate benchmark focused specifically on speech directed at voice assistants, the overall picture remains largely the same. Scribe v2 again leads with a word error rate of 1.6%, followed closely by Gemini 3 Pro at 1.7%. AssemblyAI’s Universal-3 Pro ranks third with 2.3%.

In the AA-AgentTalk test for speech on voice assistants, Scribe v2 from ElevenLabs and Gemini 3 Pro from Google also dominate with the lowest error rates. | Image: Artificial Analysis